Abstract:
The data warehousing approach aims to exploit very large volumes of data in order to make relevant decisions. In this paper, we deal with object-oriented data warehouse design. More precisely, we present an object-oriented data warehouse model that integrates temporal and archive data. We also provide functions that allow the administrator to specify a data warehouse from a global source schema.
Abstract:
Many models have been proposed for modeling multidimensional data warehouses, and most of them assume a single function to determine how measure values are aggregated across the different levels of data detail. We provide a conceptual model that supports (1) multiple aggregations, which associate with the same measure a different aggregation function depending on the analysis axis or hierarchy, and (2) differentiated aggregation, which allows specific aggregations at each level of detail. Our model is based on a graphical formalism that makes it possible to control the validity of aggregation functions (distributive, algebraic, or holistic). We also show how this conceptual modeling can be used, in an R-OLAP environment, to build lattices of pre-computed aggregates.
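The difference between a single aggregation function and per-axis functions can be illustrated with a minimal sketch (all data and names below are illustrative, not from the paper): the same measure, amount, is summed when rolling up over cities but averaged when rolling up over months.

```python
# Minimal sketch of "multiple aggregations": one aggregation function per
# analysis axis for the same measure. All names and data are illustrative.

from statistics import mean

# Hypothetical fact rows: (city, month, amount)
facts = [
    ("Paris", "Jan", 10.0),
    ("Paris", "Feb", 30.0),
    ("Lyon",  "Jan", 20.0),
    ("Lyon",  "Feb", 40.0),
]

# The function applied to "amount" depends on the axis being rolled up.
AGG_BY_AXIS = {
    "city": sum,    # rolling up over cities: total amount
    "month": mean,  # rolling up over months: average amount
}

def roll_up(rows, keep_axis):
    """Group by the kept axis; aggregate 'amount' along the dropped axis."""
    drop_axis = "month" if keep_axis == "city" else "city"
    agg = AGG_BY_AXIS[drop_axis]
    key_index = 0 if keep_axis == "city" else 1
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[2])
    return {k: agg(v) for k, v in groups.items()}
```

Here `roll_up(facts, "city")` averages each city's monthly amounts, while `roll_up(facts, "month")` sums each month over all cities, two different results for the same stored measure.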
Abstract:
NoSQL document stores offer native support for efficiently storing documents with different schemas within the same collection. However, this flexibility makes it difficult and complex to formulate queries or to manipulate collections with multiple schemas: the user has to build complex queries, or to reformulate existing ones, whenever new schemas appear in the collection. In this paper, we propose a novel approach, grounded on formal foundations, that enables schema-independent queries for querying and maintaining multi-structured documents. We introduce a query reformulation mechanism that consults a pre-constructed dictionary; this dictionary binds each possible path in the documents to all of its corresponding absolute paths across the documents. We automate the query reformulation process via a set of rules that reformulate most document store operators, such as select, project, and aggregate. In addition, we automate the reformulation of the classical manipulation operators (insert, delete, and update queries) so as to keep the dictionary up to date with the structural changes made in the collection. Both processes produce queries that are compatible with the native query engine of the underlying document store. To evaluate our approach, we conduct experiments on synthetic datasets. Our results show that the overhead induced when querying or updating is acceptable compared to the effort required to restructure the data and the time needed to execute the several queries corresponding to the different schemas inside the collection.
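The dictionary idea can be sketched as follows (a toy illustration under assumed document shapes, not the authors' implementation): each partial path is bound to the absolute paths where it occurs, and a select on a partial path is rewritten as a MongoDB-style `$or` over those absolute paths.

```python
# Sketch: two documents with heterogeneous nesting of the same field "price".
docs = [
    {"name": "a", "info": {"price": 10}},   # schema 1: price under "info"
    {"name": "b", "price": 20},             # schema 2: price at top level
]

def build_dictionary(docs):
    """Bind every trailing sub-path to the absolute paths that end with it."""
    paths = set()
    def walk(node, prefix):
        for k, v in node.items():
            p = prefix + [k]
            paths.add(tuple(p))
            if isinstance(v, dict):
                walk(v, p)
    for d in docs:
        walk(d, [])
    dictionary = {}
    for p in paths:
        for i in range(len(p)):
            suffix = ".".join(p[i:])
            dictionary.setdefault(suffix, set()).add(".".join(p))
    return dictionary

def reformulate_select(dictionary, path, value):
    """Rewrite a select on a (possibly partial) path into an $or filter."""
    absolutes = sorted(dictionary.get(path, {path}))
    if len(absolutes) == 1:
        return {absolutes[0]: value}
    return {"$or": [{p: value} for p in absolutes]}
```

A query on the partial path `price` is thus rewritten once, against the dictionary, instead of the user writing one query per schema in the collection.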
Abstract:
Traditional OLAP (On-Line Analytical Processing) systems store data in relational databases. Unfortunately, such systems struggle to manage big data volumes. As an alternative, NoSQL (Not-only SQL) systems provide scalability and flexibility for OLAP systems. We define a set of rules to map star schemas, together with their optimization structure, a precomputed aggregate lattice, onto two logical NoSQL models: column-oriented and document-oriented. Using these rules, we analyze and implement two decision support systems, one for each model (using HBase and MongoDB, respectively). We compare both systems during the phases of data loading (with data generated using the TPC-DS benchmark), lattice generation, and querying.
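One such mapping rule can be sketched for the document-oriented case (table and attribute names below are hypothetical, not the paper's rules verbatim): each fact row becomes a single nested document that embeds the dimension records it references.

```python
# Illustrative star schema: one fact table and two dimension tables,
# keyed by surrogate ids. All names and values are made up.
dim_date = {1: {"day": "2024-01-01", "month": "2024-01"}}
dim_store = {7: {"city": "Paris", "country": "FR"}}
fact_sales = [{"date_id": 1, "store_id": 7, "amount": 99.5}]

def fact_to_document(fact):
    """Map a fact row to a nested document: measures plus one embedded
    sub-document per referenced dimension (document-oriented model)."""
    return {
        "measures": {"amount": fact["amount"]},
        "date": dim_date[fact["date_id"]],
        "store": dim_store[fact["store_id"]],
    }
```

The column-oriented mapping would instead place measures and each dimension's attributes in separate column families of a single row, but the principle, denormalizing the star join at load time, is the same.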
Abstract:
Monitoring and analyzing sensor networks is essential for exploring energy consumption in smart buildings and cities. However, the data generated by sensors are affected by various types of anomalies, which makes analysis tasks more complex. Anomaly detection is used to find anomalous observations in the data. In this paper, we propose a pattern-based method for anomaly detection in sensor networks, entitled CoRP "Composition of Remarkable Point", that detects different types of anomalies simultaneously. Our method first detects remarkable points in time series based on patterns, then detects anomalies through pattern compositions. We compare our approach to methods from the literature and evaluate them through a series of experiments based on real data and on data from a benchmark.
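The two-step idea, remarkable points first and compositions second, can be sketched as follows (the actual CoRP patterns, labels, and thresholds are defined in the paper; everything below is an illustrative toy):

```python
def remarkable_points(series, threshold):
    """Step 1: label points where the series jumps up ('U') or down ('D')
    by at least the threshold, our toy stand-in for CoRP's point patterns."""
    labels = []
    for i in range(1, len(series)):
        delta = series[i] - series[i - 1]
        if delta >= threshold:
            labels.append((i, "U"))
        elif delta <= -threshold:
            labels.append((i, "D"))
    return labels

def compose(labels):
    """Step 2: a composition 'U immediately followed by D' is flagged as a
    peak anomaly; other compositions would map to other anomaly types."""
    anomalies = []
    for (i, a), (j, b) in zip(labels, labels[1:]):
        if a == "U" and b == "D" and j == i + 1:
            anomalies.append(("peak", i))
    return anomalies
```

Because anomaly types are expressed as compositions over a shared vocabulary of labeled points, several types can be detected in one pass over the label sequence.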
Abstract:
Context-aware recommender systems (CARS) rely on a multidimensional rating function: Users × Items × Context → Ratings. This multidimensional modelling should improve the quality of the recommendation process but, unfortunately, it is rare, or even impossible, to have ratings for all possible cases of context. Our objective is therefore twofold: (i) to reduce the dimensionality of the contextual information (in order to reduce sparsity), which leads us to (ii) propose a technique for aggregating the ratings associated with the aggregated dimensions. To do this, we organize the contextual information in the CARS utility matrix along hierarchical dimensions, as is done in OLAP (On-Line Analytical Processing), and we use a regression-based approach to aggregate ratings according to the previously defined hierarchies. Our approach supports multiple dimensions and hierarchical aggregation of ratings; it was validated on two real-world datasets.
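The two steps can be sketched with made-up data: the context dimension is reduced by climbing a hierarchy (hour -> day part), and child-level ratings are aggregated into a parent-level rating with a simple one-feature least-squares regression, a stand-in for the paper's regression-based aggregation, not its actual model.

```python
# Illustrative context hierarchy: hours roll up to day parts.
HIERARCHY = {"8h": "morning", "9h": "morning", "20h": "evening", "21h": "evening"}

# Hypothetical data: per-hour ratings, plus some observed day-part ratings
# used to fit the regressor.
child_ratings = {("u1", "i1"): {"8h": 4.0, "9h": 2.0},
                 ("u2", "i1"): {"8h": 5.0, "9h": 3.0}}
parent_ratings = {("u1", "i1"): 3.2, ("u2", "i1"): 4.2}

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def aggregate(child, parent):
    """Predict each (user, item) day-part rating from the mean of its
    hour-level ratings, via a regression fitted on the observed pairs."""
    xs = [sum(r.values()) / len(r) for r in child.values()]
    ys = [parent[k] for k in child]
    a, b = fit_line(xs, ys)
    return {k: a * (sum(r.values()) / len(r)) + b for k, r in child.items()}
```

With only two observed pairs the fitted line passes through both points exactly; the point of the sketch is the mechanism: a learned mapping from child-level ratings to the aggregated level, rather than a fixed mean.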